Marta Karas
February 12, 2019
RStudio conference vs UseR! conference
gganimate (with ggplot intro) for animated plots
gt to turn a data table into “information-rich, publication-quality” table outputs
pagedown to get paged HTML documents
Note: A number of screenshots, images and citations are used throughout this presentation. I provide references on the last slides.
RStudio 2019 conference
January 15-18, 2019 | Austin, TX, USA
Talks focused on RStudio, Inc. products: RStudio, Shiny, R packages, RStudio Server, RStudio Connect
Call for poster submission only
Diversity scholarships were available (😏 unsuccessful applicants got a conference fee discount)
useR! 2019 conference
July 9-12, 2019 | Toulouse, France
Topics seem more welcoming to academic work (one from last year: “Using mommix for fast, large-scale genome-studies in the presence of gene-environment and gene-gene interaction”)
Call for submissions: tutorials, oral presentations, lightning talks, posters (deadline: Jan 18 for tutorials, Mar 1, 2019 for the other submissions)
Diversity scholarships are available (Deadline: Mar 1, 2019)
I wish I had done this:
Extension to ggplot2, provides “implementation of the grammar of animated graphics”
Returns a gif_image object which is a simple wrapper around a path to a gif file
Available on CRAN: install.packages("gganimate").
Development version can be installed from GitHub ("thomasp85/gganimate")
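A minimal sketch of rendering and saving an animation, to make the gif_image point concrete. The plot itself, the frame settings and the output file name are illustrative assumptions; gifski_renderer() additionally assumes the gifski package is installed.

```r
library(ggplot2)
library(gganimate)

# Illustrative animation: one state per number of cylinders
anim <- ggplot(mtcars, aes(mpg, wt)) +
  geom_point() +
  transition_states(cyl, transition_length = 1, state_length = 1)

# Render explicitly; the result wraps the path to the rendered gif file
gif <- animate(anim, nframes = 20, fps = 10, renderer = gifski_renderer())

# Save the rendered animation to a file (hypothetical file name)
out_file <- file.path(tempdir(), "mtcars_cyl.gif")
anim_save(out_file, animation = gif)
```

Printing an animation object without calling animate() renders it with default settings, which is what the later examples in this presentation rely on.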
ggplot2 - R package to create relatively complicated plots in a relatively simple way
Uses the “grammar of graphics”, which “tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars)”
To make graphics using ggplot2, the data needs to be in a tidy format
Tidy data:
Each variable forms a column.
Each observation forms a row.
Each type of observational unit forms a table.
Messy data:
Column headers are values, not variable names.
Multiple variables are stored in one column.
Variables are stored in both rows and columns.
Multiple types of observational units are stored in the same table.
A single observational unit is stored in multiple tables.
Each variable forms a column. Each observation forms a row.
Column headers are values, not variable names
Read more about tidy data and see other examples: Tidy Data by Hadley Wickham
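As a small illustration of the first messy-data pattern (column headers are values, not variable names), the sketch below reshapes a made-up table into tidy form. The data frame, its column names and the choice of tidyr::pivot_longer() (which assumes tidyr 1.0.0 or later) are illustrative assumptions, not taken from the paper.

```r
library(tidyr)

# Messy: the column headers "2017" and "2018" are values of a "year" variable
messy <- data.frame(
  country = c("A", "B"),
  `2017`  = c(10, 20),
  `2018`  = c(11, 22),
  check.names = FALSE
)

# Tidy: one column per variable (country, year, cases), one row per observation
tidy <- pivot_longer(
  messy,
  cols      = c("2017", "2018"),
  names_to  = "year",
  values_to = "cases"
)
tidy
```

The tidy version can be passed straight to ggplot(), for example with aes(x = year, y = cases, group = country).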
Draw line, mark points.
######################################################################
## Make some foo data frame
df <- data.frame(time = 1:10, value = rnorm(10))
## Plot
library(ggplot2)
ggplot(df, aes(x = time, y = value)) +
geom_point() +
geom_line()
Draw line, mark points, modify line and points look.
#################################################################################
ggplot(df, aes(x = time, y = value)) +
geom_point(size = 5,
color = "red") +
geom_line(linetype = 2,
color = "brown",
size = 0.8) +
theme_bw(base_size = 20) +
labs(x = "My x axis label",
y = "My y axis label",
title = "My plot title")
#################################################################################
## Make some foo data frame
##
## - 20 different items in 2 different categories
## - value time series for each item
## - 100 time points of data collection for each item
##
set.seed(1)
time <- as.vector(replicate(20, 1:100))
item <- as.vector(sapply(1:20, function(i) rep(i, 100)))
value <- as.vector(replicate(20, cumsum(rnorm(100))))
categ <- as.vector(sapply(1:20, function(i) rep(sample(c("A", "B"), 1), 100)))
df <- data.frame(time, item, value, categ)
str(df)
'data.frame': 2000 obs. of 4 variables:
$ time : int 1 2 3 4 5 6 7 8 9 10 ...
$ item : int 1 1 1 1 1 1 1 1 1 1 ...
$ value: num -0.626 -0.443 -1.278 0.317 0.646 ...
$ categ: Factor w/ 2 levels "A","B": 1 1 1 1 1 1 1 1 1 1 ...
Grouping lines by item variable.
#############################################
ggplot(df,
aes(x = time,
y = value,
group = item)) +
geom_line() +
theme_grey(base_size = 20)
Grouping lines by item, colouring lines by categ variable.
#############################################
ggplot(df,
aes(x = time,
y = value,
group = item,
color = categ)) +
geom_line() +
labs(color = "Category: ") +
theme_grey(base_size = 20)
Grouping boxplots by item variable.
#############################################
ggplot(df,
aes(x = item, y = value,
group = item)) +
geom_boxplot() +
labs(x = "Item ID",
y = "Value") +
theme_grey(base_size = 20)
Grouping boxplots by item variable, filling with color by categ variable. Use alpha to make boxplot fill transparent.
#############################################
ggplot(df,
aes(x = item, y = value,
group = item,
fill = categ)) +
geom_boxplot(alpha = 0.3) +
labs(x = "Item ID",
y = "Value",
fill = "Category: ") +
theme_grey(base_size = 20) +
theme(legend.position = "top")
####################################################################
## Make some new foo data
categ <- sample(c("cat_A", "cat_B"), 1000, replace = TRUE)
id <- as.vector(replicate(1000/4, paste0("ID_", 1:4)))
value <- rnorm(1000)
df <- data.frame(categ, id, value)
ggplot(df, aes(x = value)) +
geom_histogram(fill = "yellow", color = "black") +
facet_grid(id ~ .) +
labs(x = "Value", y = "Count") +
theme_grey(base_size = 20)
####################################################################
ggplot(df, aes(x = value)) +
geom_histogram(fill = "blue", color = "black", alpha = 0.1) +
facet_grid(id ~ categ) +
labs(x = "Value", y = "Count") +
theme_bw(base_size = 20)
See also: facet_wrap.
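A minimal sketch of facet_wrap(), which lays panels for a single variable out in a wrapped grid instead of one row or column. The data below is simulated for illustration only and mirrors the id/value columns used above.

```r
library(ggplot2)

# Simulated data: 4 IDs, 250 values each (illustrative only)
set.seed(1)
df_wrap <- data.frame(
  id    = rep(paste0("ID_", 1:4), times = 250),
  value = rnorm(1000)
)

# One histogram panel per id, wrapped into 2 columns
p_wrap <- ggplot(df_wrap, aes(x = value)) +
  geom_histogram(bins = 30, fill = "yellow", color = "black") +
  facet_wrap(~ id, ncol = 2) +
  labs(x = "Value", y = "Count")
p_wrap
```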
Without gganimate:
Boxplot of miles per (US) gallon, stratified by number of cylinders (x-axis) and by number of gears (panels stacked in rows).
###############################################
plt <-
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
facet_grid(gear ~ .) +
labs(x = 'Number of cylinders',
y = 'Miles/(US) gallon') +
theme_grey(base_size = 20)
library(gganimate)
ggplot(mtcars, aes(factor(cyl), mpg)) +
geom_boxplot() +
# below: gganimate code
transition_states(
gear,
transition_length = 0.5,
state_length = 0.5
) +
enter_fade() +
exit_shrink() +
ease_aes('sine-in-out') +
labs(title = 'Gear: {closest_state}',
x = 'Number of cylinders',
y = 'Miles/(US) gallon')
transition_states() splits the data into multiple states
enter_fade(), exit_shrink() define how data that appears in or disappears from a state is handled
ease_aes() defines how a value changes to another (will it progress linearly, or maybe start slowly and then build up momentum?)
library(gapminder)
ggplot(gapminder,
aes(gdpPercap, lifeExp,
size = pop, colour = country)) +
geom_point(alpha = 0.7, show.legend = FALSE) +
scale_colour_manual(values = country_colors) +
scale_size(range = c(2, 12)) +
scale_x_log10() +
facet_wrap(~ continent, ncol = 2) +
# Here comes the gganimate specific code
transition_time(year) +
labs(title = 'Year: {frame_time}',
x = 'GDP per capita',
y = 'life expectancy') +
ease_aes('linear')
transition_time() is a variant of transition_states() intended for data where the states represent specific points in time; the transition length between states is set to correspond to the actual time difference between them
gt philosophy: construct a wide variety of tables with a cohesive set of table parts
Start with table data (a tibble or a data.frame)
Build a gt object with the elements you need for the task at hand
sp500: data of daily price indicators for the S&P 500 index from 1950 to 2015
HTML output produced by printing a data frame
# devtools::install_github("rstudio/gt")
library(tidyverse)
library(gt)
start_date <- "2010-06-07"
end_date <- "2010-06-14"
out1 <-
sp500 %>%
filter(date >= start_date & date <= end_date) %>%
select(-adj_close) %>%
mutate(date = as.character(date))
out1
out1 %>%
gt() %>%
tab_header(
title = "S&P 500",
subtitle = glue::glue("{start_date} to {end_date}")
) %>%
fmt_date(
columns = vars(date),
date_style = 3
) %>%
fmt_currency(
columns = vars(open, high, low, close),
currency = "USD"
) %>%
fmt_number(
columns = vars(volume),
scale_by = 1 / 1E9,
pattern = "{x}B"
)
tab_header - add a table header with a title (and subtitle)
fmt_date - format date values according to a certain style
fmt_currency - do currency-based formatting with fine control options
fmt_number - do number-based formatting so that the targeted values are rendered with a “higher consideration for tabular presentation”
Many more functions in gt make it possible to create highly customized tables; see package website: https://github.com/rstudio/gt
kable()
library(knitr)
library(kableExtra)
out1 %>%
kable() %>%
kable_styling(bootstrap_options = c("striped"),
font_size = 15) %>%
column_spec(6, bold = TRUE,
background = "yellow") %>%
row_spec(4:5, color = "white",
background = "#D7261E")
Yihui Xie: main author of knitr R package and R Markdown document format
Markdown - “(1) a plain text formatting syntax” designed to be as readable as possible
R Markdown = Markdown + R code chunks
knitr - executes the computer code embedded in Markdown, and converts R Markdown to Markdown
Pandoc: renders Markdown to the output format you want (PDF, HTML, Word, etc.)
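The two steps above can be sketched as follows. The document content and file names are made up for illustration, and the code assumes the knitr and rmarkdown packages plus a Pandoc installation are available. The chunk delimiters are built with strrep() only to avoid writing literal backtick fences inside this example.

```r
# Build a tiny R Markdown document: Markdown text plus one R code chunk
fence <- strrep("`", 3)
rmd_lines <- c(
  "# Demo",
  "",
  "Some *Markdown* text, followed by an R chunk.",
  "",
  paste0(fence, "{r}"),
  "summary(cars$speed)",
  fence
)
rmd_file <- file.path(tempdir(), "pipeline_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Step 1: knitr runs the R code and converts R Markdown to Markdown
md_file <- knitr::knit(
  rmd_file,
  output = file.path(tempdir(), "pipeline_demo.md"),
  quiet  = TRUE
)

# Steps 1 + 2 in one call: rmarkdown::render() knits, then hands the
# Markdown to Pandoc to produce the requested output format
html_file <- rmarkdown::render(
  rmd_file,
  output_format = "html_document",
  quiet         = TRUE
)
```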
Talk topic: pagedown package
From package website:
Paginate the HTML Output of R Markdown with CSS for Print. You only need a modern web browser (e.g., Google Chrome) to generate PDF. No need to install LaTeX to get beautiful PDFs.
Also:
Description: Use the paged media properties in CSS and the JavaScript library 'paged.js' to split the content of an HTML document into discrete pages. Each page can have its page size, page numbers, margin boxes, and running headers, etc. Applications of this package include books, letters, reports, papers, business cards, resumes, and posters.
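A minimal sketch of producing a paged HTML document, assuming the pagedown and rmarkdown packages and Pandoc are installed. The document content and file name are illustrative; pagedown::html_paged is the package's paged output format, and the optional chrome_print() step additionally assumes Chrome or Chromium is available.

```r
# Build a tiny R Markdown document that requests the paged HTML format
fence <- strrep("`", 3)
rmd_lines <- c(
  "---",
  "title: \"A paged report\"",
  "output: pagedown::html_paged",
  "---",
  "",
  "# First chapter",
  "",
  "Some text, followed by an R chunk.",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
)
rmd_file <- file.path(tempdir(), "paged_demo.Rmd")
writeLines(rmd_lines, rmd_file)

# Render to paged HTML; open the result in a modern browser to see the pages
html_file <- rmarkdown::render(rmd_file, quiet = TRUE)

# Optional: print the paged HTML to PDF (needs Chrome or Chromium)
# pagedown::chrome_print(html_file)
```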
The gganimate materials (description, gifs, R code examples) were sourced from gganimate package website: https://gganimate.com/.
The gt materials (description, images, R code examples) were sourced from gt package website: https://github.com/rstudio/gt.
Text from the “Quick intro to ggplot2 package” slide was partially sourced from Jeff Leek materials available here.
Text from the “Tidy vs messy data” slide, and table screenshots from the slides “Tidy data: Examples” and “Messy data: Example”, were sourced from the Tidy Data paper by Hadley Wickham available here.
pagedown package logo, text and examples were sourced from package documentation and (mostly) the presentation by Yihui Xie and Romain Lesur, available here.